A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Abstract
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat unsatisfactory because they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
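The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way an iterative, interactive training loop of this kind can look: roll out the current policy, record what an expert would have done in the states visited, add those pairs to a growing dataset, and refit a single stationary deterministic policy on everything collected so far. The toy 1-D environment, the `expert_action` rule, the tabular majority-vote learner, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def expert_action(state, goal=5):
    # Hypothetical expert: step toward the goal position, stay put once there.
    return 1 if state < goal else (-1 if state > goal else 0)


def step(state, action, rng):
    # Toy dynamics (an assumption): noisy move, clipped to the range [0, 10].
    noise = rng.choice([-1, 0, 0, 1])
    return max(0, min(10, state + action + noise))


def fit_policy(dataset):
    # Tabular learner: majority expert label per state.
    # States never visited fall back to action 0.
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, 0)


def aggregate_and_train(n_iters=10, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset = []
    policy = expert_action  # the first rollout follows the expert
    for _ in range(n_iters):
        state = rng.randint(0, 10)
        for _ in range(horizon):
            # Label the state the *current* policy reached with the expert's action.
            dataset.append((state, expert_action(state)))
            state = step(state, policy(state), rng)
        # Refit one stationary deterministic policy on all data gathered so far.
        policy = fit_policy(dataset)
    return policy, dataset


if __name__ == "__main__":
    policy, dataset = aggregate_and_train()
    print(len(dataset), "labelled states collected")
```

The point of the design is that training states come from the learner's own rollouts rather than only from expert trajectories, so the learned policy is fit on the distribution of observations it induces, which is the mismatch the abstract identifies.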
Similar resources
Reinforcement and Imitation Learning via Interactive No-Regret Learning
Recent work has demonstrated that problems, particularly imitation learning and structured prediction, where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of acti...
No-Regret Methods for Learning Sequential Predictions Thesis Proposal
Sequential prediction problems arise commonly in many areas of robotics and information processing. For instance, in robot navigation tasks, autonomous robots rely on the ability to make a sequence of actions, given a sequence of observations revealed to them over time, in order to reach the desired goal location. Similarly, complex information processing tasks, such as structured prediction pr...
HC-Search: Learning Heuristics and Cost Functions for Structured Prediction
Structured prediction is the problem of learning a function from structured inputs to structured outputs. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high quality candidate outputs and then...
Learning to Search: Structured Prediction Techniques for Imitation Learning
Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, time-consuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as ...
Multi-Armed Bandits on Unit Interval Graphs
An online learning problem with side information on the similarity and dissimilarity across different actions is considered. The problem is formulated as a stochastic multi-armed bandit problem with a graph-structured learning space. Each node in the graph represents an arm in the bandit problem and an edge between two nodes represents closeness in their mean rewards. It is shown that the result...
Publication date: 2011